For a general sensory system following an external stochastic signal, we introduce the sensory 
capacity. This quantity characterizes the performance of a sensor: sensory capacity is maximal if 
the instantaneous state of the sensor has as much information about a signal as the whole time-series 
of the sensor. We show that adding a memory to the sensor increases the sensory capacity. This 
increase quantifies the improvement of the sensor with the addition of the memory. Our results are 
obtained with the framework of stochastic thermodynamics of bipartite systems, which allows for 
the definition of an efficiency that relates the rate with which the sensor learns about the signal 
with the energy dissipated by the sensor, which is given by the thermodynamic entropy production. 

We demonstrate a general tradeoff between sensory capacity and efficiency: if the sensory capacity 
is equal to its maximum 1, then the efficiency must be less than 1/2. As a physical realization 
of a sensor we consider a two component cellular network estimating a fluctuating external ligand 
concentration as signal. This model leads to coupled linear Langevin equations that allow us to 
obtain explicit analytical results. 


I. INTRODUCTION 

The relation between information and thermodynamics is a very active topic, as reviewed in [1]. Prominently, developments in this field lead to a better understanding of fundamental limits related to dissipation in a computer and of cellular information processing. Much of the renewed interest in this relation between information and thermodynamics is associated with the fact that recent experiments with small systems verify fundamental relations like the Landauer limit for the erasure of a bit [2, 3] and the conversion of information into work [4-6]. Theoretical advances in the field include second law inequalities and fluctuation relations containing an informational term [7-29], generalization of thermodynamics to include information reservoirs [30-39], stochastic thermodynamics of bipartite systems [40-46], and the relation between dissipation and information in biological systems [47-59].

A sensor that learns about (or "measures") an external stochastic signal constitutes a fundamental setup within thermodynamics of information processing. In this case energy is dissipated and the sensor obtains information about the external signal, in contrast to a Maxwell's demon, which is another fundamental setup, where information is used to extract work.

General results for the thermodynamics of a sensor have been obtained by Still et al. [60]. They have shown that an entropy characterizing how much information the sensor obtains about the external signal is bounded by the dissipated heat. Similarly, we have shown that an entropic rate, dubbed learning rate, is bounded by the thermodynamic entropy production in bipartite systems [55], which allowed for the definition of a thermodynamic efficiency for models related to cellular information processing.

In this paper, using bipartite Markov processes we introduce the sensory capacity, an informational efficacy parameter characterizing the performance of a sensor. This quantity is defined as the learning rate divided by the transfer entropy rate, where the latter quantifies how much information the full time series of the sensor has about the signal. Sensory capacity is positive and bounded by 1. The limit 1 is reached if the information contained in the instantaneous state of the sensor equals the information contained in the whole time series of the sensor, which is the maximum information the sensor can have about the signal.

A bare sensor, i.e., a sensor with only one degree of freedom, is compared to a sensor that contains a memory, which is a second degree of freedom. We show that the addition of a memory to a bare sensor can increase the sensory capacity. This increase in sensory capacity quantifies how much of the information contained in the time series of the bare sensor is stored in the instantaneous state of the memory.

Our results are obtained with coupled linear Langevin equations that constitute a simple example of a bipartite system. These linear Langevin equations are derived from a discrete model for a two component cellular network estimating an external ligand concentration, which is the signal. The two components of the network are receptors that can bind external ligands and internal proteins that play the role of memory [48, 53, 54, 58, 61]. This derivation starting with a physical model for a sensor allows us to provide a clear physical interpretation for the parameters showing up in the Langevin equations and for the thermodynamic entropy production.

The relation between sensory capacity and energy dissipation is also discussed. Particularly, as a main result we show that if the sensory capacity is 1, the efficiency relating learning rate and rate of dissipation must be smaller than 1/2. This result is valid for any bipartite process. The specific tradeoff between sensory capacity and efficiency for the coupled linear Langevin equations is analyzed in detail.

The paper is organized as follows. In Sec. II we define discrete bipartite processes and the quantities calculated in the paper. Sec. III contains the derivation of the coupled linear Langevin equations from the microscopic model for a two component network. The analysis of the Langevin equations is performed in Sec. IV. The general tradeoff between sensory capacity and efficiency is derived in Sec. V. We conclude in Sec. VI. The continuum limit from a master equation to a Langevin equation in bipartite systems is presented in Appendix A. The uncertainty about the signal given the sensor state and the uncertainty given the sensor trajectory are calculated in Appendix B.

II. BIPARTITE MARKOV PROCESSES AND 
SENSORY CAPACITY 


A. Definition of bipartite systems

A state of the signal is denoted by $x$ and a state of the sensor by $y$. We consider a quite general framework, where the basic assumptions are that the dynamics of the full system composed of the signal and the sensor is Markovian, the dynamics of the signal is not affected by the sensor whereas the dynamics of the sensor is affected by the signal, and the signal alone is also Markovian. For a Markov jump process these assumptions imply the following transition rates from a state $(x,y)$ to a state $(x',y')$,

$$w^{xx'}_{yy'} \equiv \begin{cases} w_{xx'} & \text{if } x \neq x' \text{ and } y = y', \\ w^{x}_{yy'} & \text{if } x = x' \text{ and } y \neq y', \\ 0 & \text{if } x \neq x' \text{ and } y \neq y'. \end{cases} \tag{1}$$

Such a Markov process, for which the two variables labeling a state cannot both change in a jump, is called bipartite [41]. The rates (1) correspond to a particular case of a bipartite process since $w_{xx'}$ is independent of $y$. For bipartite systems in a steady state, which is the regime we consider in this paper, the stationary probability of state $(x,y)$ is written as $P(x,y)$. The marginals of this joint probability are defined as $P(x) \equiv \sum_y P(x,y)$ and $P(y) \equiv \sum_x P(x,y)$. The stationary conditional probabilities read $P(x|y) \equiv P(x,y)/P(y)$ and $P(y|x) \equiv P(x,y)/P(x)$.

Key quantities in this paper are Shannon entropy and mutual information. The Shannon entropy associated with a random variable $A$ is

$$H[A] \equiv -\sum_{a} P(A = a) \ln P(A = a), \tag{2}$$

where $a$ is a specific realization of $A$ and $P$ denotes a generic probability. The random variable $A$ can be the instantaneous state of the signal $x_t$ or of the sensor $y_t$. Furthermore, $A$ can be a full time series of the signal $\{x_{t'}\}_{t' \le t}$ or of the sensor $\{y_{t'}\}_{t' \le t}$. In the first case, the sum over $a$ in Eq. (2) is a sum over all possible states. In the second case, this sum corresponds to a functional integration over all possible trajectories. The conditional Shannon entropy of $A$ given another random variable $B$ is

$$H[A|B] \equiv -\sum_{a,b} P(A = a, B = b) \ln P(A = a|B = b). \tag{3}$$

The mutual information between $A$ and $B$ reads

$$I[A:B] \equiv H[A] - H[A|B] = H[B] - H[B|A], \tag{4}$$

where the second equality indicates that the mutual information is symmetric in the variables $A$ and $B$.

B. Learning rate

The learning rate is defined as [55]

$$l_y \equiv \frac{H[x_t|y_t] - H[x_t|y_{t+dt}]}{dt}, \tag{5}$$

where here and in the following in all expressions that involve a $dt$ in the denominator the limit $dt \to 0$ is assumed. The learning rate quantifies the rate at which the sensor acquires information about the instantaneous signal state $x_t$, i.e., the rate at which the sensor reduces the uncertainty (as characterized by the conditional Shannon entropy) of the signal due to its dynamics [55]. The learning rate can also be written in terms of mutual information

$$l_y = \frac{I[x_t:y_{t+dt}] - I[x_t:y_t]}{dt}, \tag{6}$$

which is the rate at which the $y$ jumps increase the mutual information between the sensor $y$ and the signal $x$. This form of the learning rate is also known as "information flow" [40, 44, 45]. Using the relations

$$P(x_{t+dt} = x'|x_t = x) = w_{xx'}\,dt \ \text{ for } x \neq x', \qquad P(y_{t+dt} = y'|x_t = x, y_t = y) = w^{x}_{yy'}\,dt \ \text{ for } y \neq y', \tag{7}$$

the learning rate (5) becomes

$$l_y = \sum_{x,y,y'} P(x,y)\, w^{x}_{yy'} \ln\frac{P(x|y')}{P(x|y)}. \tag{8}$$

In the steady state the learning rate is equal to the rate of Shannon entropy reduction of $x$ due to its coupling with $y$, which is defined as [42]

$$h_x \equiv \frac{H[x_{t+dt}|y_t] - H[x_t|y_t]}{dt} = l_y. \tag{9}$$

This conservation law comes from the relation $\frac{d}{dt}H[x|y] = h_x - l_y = 0$ [55], where $h_x$ is the contribution due to the $x$ jumps, i.e.,

$$h_x = \sum_{x,x',y} P(x,y)\, w_{xx'} \ln\frac{P(x|y)}{P(x'|y)}. \tag{10}$$
FIG. 1. (Color online) Learning rate versus transfer entropy rate. The learning rate takes into account only the instantaneous state $y_t$ (dashed green box) to infer the signal $x_t$, whereas the transfer entropy $T_{x\to y}$ takes into account the trajectory highlighted by the blue shaded region.


Since in the stationary state $H[y_{t+dt}] = H[y_t]$, the learning rate can also be written in the form

$$l_y = h_x = \frac{I[x_t:y_t] - I[x_{t+dt}:y_t]}{dt}. \tag{11}$$

This expression is similar to the one used in [60], where within a discrete time formalism the term $I[x_{t+dt}:y_t]$ is identified as "predictive power".
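The conservation law $l_y = h_x$ can be checked numerically on a small example. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical binary signal and binary sensor with made-up rates `GAMMA`, `K_ON` and `K_OFF`, relaxes the master equation to its steady state with explicit Euler steps, and then evaluates Eqs. (8) and (10).

```python
import math

# Hypothetical toy model (illustration only): binary signal x, binary sensor y.
GAMMA = 1.0              # signal flip rate w_{xx'}, independent of y
K_ON, K_OFF = 5.0, 0.5   # sensor rates towards / away from the state y == x

STATES = [(x, y) for x in (0, 1) for y in (0, 1)]


def w_x(x, x2):
    """Signal transition rate w_{xx'}."""
    return GAMMA if x != x2 else 0.0


def w_y(x, y, y2):
    """Sensor transition rate w^x_{yy'}; the sensor tends to copy the signal."""
    if y == y2:
        return 0.0
    return K_ON if y2 == x else K_OFF


def steady_state(dt=0.01, steps=20000):
    """Relax the bipartite master equation with explicit Euler steps."""
    p = {s: 0.25 for s in STATES}
    for _ in range(steps):
        new = dict(p)
        for (x, y) in STATES:
            # Bipartite structure: a jump changes either x or y, never both.
            for (x2, y2) in ((1 - x, y), (x, 1 - y)):
                rate = w_x(x, x2) if y2 == y else w_y(x, y, y2)
                flow = dt * rate * p[(x, y)]
                new[(x, y)] -= flow
                new[(x2, y2)] += flow
        p = new
    return p


def cond_x_given_y(p, x, y):
    """Stationary conditional probability P(x|y)."""
    return p[(x, y)] / (p[(0, y)] + p[(1, y)])


def learning_rate(p):
    """Eq. (8): contribution of the y jumps."""
    return sum(p[(x, y)] * w_y(x, y, 1 - y)
               * math.log(cond_x_given_y(p, x, 1 - y) / cond_x_given_y(p, x, y))
               for (x, y) in STATES)


def entropy_rate_x(p):
    """Eq. (10): contribution of the x jumps."""
    return sum(p[(x, y)] * w_x(x, 1 - x)
               * math.log(cond_x_given_y(p, x, y) / cond_x_given_y(p, 1 - x, y))
               for (x, y) in STATES)


P_STEADY = steady_state()
L_Y = learning_rate(P_STEADY)
H_X = entropy_rate_x(P_STEADY)
print(L_Y, H_X)
```

For this symmetric toy model the stationary distribution can also be found by hand ($P = 0.4$ for aligned states and $0.1$ for misaligned ones), which gives $l_y = h_x = 0.6 \ln 4$ and agrees with the numerical values.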


C. Sensory capacity and transfer entropy rate 

Transfer entropy is an informational quantity that detects causal influence between two random variables [62]. It plays an important role in information thermodynamics for causal networks [19], bipartite systems [40, 42, 45], and feedback driven systems [18]. The transfer entropy rate from the signal to the sensor $T_{x\to y}$ is defined as [42]

$$T_{x\to y} \equiv \frac{H[y_{t+dt}|\{y_{t'}\}_{t'\le t}] - H[y_{t+dt}|\{y_{t'}\}_{t'\le t}, x_t]}{dt} = \frac{I[y_{t+dt}:x_t|\{y_{t'}\}_{t'\le t}]}{dt} = \frac{H[x_t|\{y_{t'}\}_{t'\le t}] - H[x_t|y_{t+dt},\{y_{t'}\}_{t'\le t}]}{dt}. \tag{12}$$

In the third form the similarity with the learning rate (5) is explicit: the transfer entropy rate $T_{x\to y}$ quantifies how much information the whole sensor trajectory $\{y_{t'}\}_{t'\le t}$ contains about the instantaneous signal $x_t$, in contrast to the learning rate that considers only the instantaneous state $y_t$. This difference between the learning rate $l_y$ and the transfer entropy rate $T_{x\to y}$ is illustrated in Fig. 1. The first form in Eq. (12) contains the standard definition of transfer entropy from the signal to the sensor [62], which can be described as the reduction of the conditional Shannon entropy of $y_{t+dt}$ given $\{y_{t'}\}_{t'\le t}$ by the further knowledge of the signal state $x_t$.

As shown in [42], $l_y \le T_{x\to y}$, which simply means that the whole trajectory of the sensor $\{y_{t'}\}_{t'\le t}$ contains at least as much information about the instantaneous signal $x_t$ as the instantaneous state of the sensor $y_t$. Based on this inequality we propose the definition

$$C \equiv \frac{l_y}{T_{x\to y}} \le 1 \tag{13}$$

that we call sensory capacity. If $C = 1$ the sensor has reached an information theoretical limit and its instantaneous state has the maximum possible information, which is the information contained in the whole time series of the sensor. On a side note, as a result related to the fact that the full time series of a sensor contains more information about the signal than its instantaneous state, it has been shown that an information driven machine using the whole history of measurements can extract more work than a machine that only takes the last measurement into account [22, 27]. This increase in work extraction is characterized by a gain parameter that, like the sensory capacity, is positive and bounded by one.


D. Thermodynamic entropy production and 
efficiency 


The thermodynamic entropy production [63] for bipartite processes has two contributions. One is due to jumps that change the state of the signal,

$$\sigma_x \equiv \sum_{x,x'} P(x)\, w_{xx'} \ln\frac{w_{xx'}}{w_{x'x}}. \tag{14}$$

If the bare signal is an equilibrium process, which is the case for the examples considered in this paper, $\sigma_x = 0$. The second contribution arises from jumps that change the state of the sensor, which reads

$$\sigma_y \equiv \sum_{x,y,y'} P(x,y)\, w^{x}_{yy'} \ln\frac{w^{x}_{yy'}}{w^{x}_{y'y}}. \tag{15}$$

The inequality $l_y \le \sigma_y$ leads to the efficiency [55]

$$\eta \equiv \frac{l_y}{\sigma_y} \le 1. \tag{16}$$

This efficiency relates the rate at which the sensor learns about the signal with the rate of free energy dissipation, which is quantified by the thermodynamic entropy production. For the model system in Sec. III, the entropy production has two terms. One is related to work done by the external signal and another to free energy dissipation inside the cell.
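As an illustration of Eqs. (15) and (16), the efficiency can be evaluated in closed form for a small example. This is a sketch under stated assumptions, not a model from the paper: a hypothetical binary signal flips with rate `GAMMA`, and a binary sensor moves towards the signal state with rate `K_ON` and away from it with rate `K_OFF`. By symmetry the stationary distribution takes only two values, $a$ for aligned states ($y = x$) and $b$ for misaligned ones, with $a(\gamma + k_{\rm off}) = b(\gamma + k_{\rm on})$.

```python
import math

# Hypothetical rates for a binary signal x and a binary sensor y (illustration only).
GAMMA = 1.0   # signal flip rate w_{xx'}
K_ON = 5.0    # sensor rate towards the aligned state y == x
K_OFF = 0.5   # sensor rate away from the aligned state


def stationary_pair(gamma, k_on, k_off):
    """Return (a, b) = (P(x, y == x), P(x, y != x)); note that 2a + 2b = 1."""
    ratio = (gamma + k_on) / (gamma + k_off)   # a / b from probability balance
    b = 0.5 / (1.0 + ratio)
    return ratio * b, b


def sensor_entropy_production(gamma, k_on, k_off):
    """Eq. (15) for the toy model; the factor 2 accounts for the two values of x."""
    a, b = stationary_pair(gamma, k_on, k_off)
    return 2.0 * (b * k_on * math.log(k_on / k_off)
                  + a * k_off * math.log(k_off / k_on))


def learning_rate(gamma, k_on, k_off):
    """Eq. (8) for the toy model."""
    a, b = stationary_pair(gamma, k_on, k_off)
    p_match, p_miss = a / (a + b), b / (a + b)   # P(x|y) for y == x and y != x
    return 2.0 * (b * k_on * math.log(p_match / p_miss)
                  + a * k_off * math.log(p_miss / p_match))


SIGMA_Y = sensor_entropy_production(GAMMA, K_ON, K_OFF)
L_Y = learning_rate(GAMMA, K_ON, K_OFF)
ETA = L_Y / SIGMA_Y   # Eq. (16)
print(L_Y, SIGMA_Y, ETA)
```

For this toy model the efficiency reduces to $\eta = \ln(a/b)/\ln(k_{\rm on}/k_{\rm off})$, which is below 1 because the signal flips keep the ratio $a/b$ smaller than $k_{\rm on}/k_{\rm off}$.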


E. Upper bound on the transfer entropy, 
coarse-grained entropy production and 
coarse-grained learning rate 


We now recall the definition of further quantities that will be calculated in this paper. The first quantity is an upper bound on the transfer entropy rate

$$\bar{T}_{x\to y} \equiv \frac{H[y_{t+dt}|y_t] - H[y_{t+dt}|y_t, x_t]}{dt} \ge T_{x\to y}. \tag{17}$$

An important property of this upper bound is that, unlike the transfer entropy rate, it can be written in terms of the stationary distribution as [42]

$$\bar{T}_{x\to y} = \sum_{x,y,y'} P(x,y)\, w^{x}_{yy'} \ln\frac{w^{x}_{yy'}}{\bar{w}_{yy'}}, \tag{18}$$

where

$$\bar{w}_{yy'} \equiv \sum_{x} P(x|y)\, w^{x}_{yy'}. \tag{19}$$

The inequality $\bar{T}_{x\to y} \ge T_{x\to y}$ is obtained by comparing Eq. (12) with Eq. (17), and using the relations $H[y_{t+dt}|y_t] \ge H[y_{t+dt}|\{y_{t'}\}_{t'\le t}]$ and $H[y_{t+dt}|\{y_{t'}\}_{t'\le t}, x_t] = H[y_{t+dt}|y_t, x_t]$.

The coarse grained entropy production is obtained by integrating the variable $x$ out, leading to the expression [64]

$$\tilde{\sigma}_y \equiv \sum_{y,y'} P(y)\, \bar{w}_{yy'} \ln\frac{\bar{w}_{yy'}}{\bar{w}_{y'y}} \ge 0. \tag{20}$$

This $\tilde{\sigma}_y$ is a lower bound on the real entropy production, i.e., $\sigma_y \ge \tilde{\sigma}_y$ [64].


F. Sensor with a memory


We now consider a sensor with two degrees of freedom $y = (r, m)$. We assume that $r$ is the first degree of freedom directly sensing the signal $x$ and $m$ is a memory storing the information collected by $r$ (see [58] for a similar setup). The coarse-grained learning rate is defined as [55]

$$l_r \equiv \frac{H[x_t|r_t] - H[x_t|r_{t+dt}]}{dt} = \sum_{x,r,r',m} P(x,r,m)\, w^{x}_{(r,m)(r',m)} \ln\frac{P(x|r')}{P(x|r)}, \tag{21}$$

where $w^{x}_{(r,m)(r',m)}$ denotes the transition rate from $(x,r,m)$ to $(x,r',m)$. The rate at which $r$ alone learns about the signal $x$ is quantified by $l_r \le l_y$ [55]. The transition rates then have the form

$$w^{xx'}_{yy'} \equiv \begin{cases} w_{xx'} & \text{if } x \neq x' \text{ and } y = y', \\ w^{x}_{rr'} & \text{if } x = x',\ r \neq r' \text{ and } m = m', \\ w^{r}_{mm'} & \text{if } x = x',\ r = r' \text{ and } m \neq m', \\ 0 & \text{otherwise}, \end{cases} \tag{22}$$

where $y' = (r', m')$. The transition rates (22) imply the causal relation $x \to r \to m$, which is illustrated in Fig. 2. Therefore, the coarse grained learning rate in Eq. (21) becomes

$$l_r = \sum_{x,r,r'} P(x,r)\, w^{x}_{rr'} \ln\frac{P(x|r')}{P(x|r)}. \tag{23}$$

FIG. 2. (Color online) Illustration of the causal relation $x \to r \to m$ for a sensor $y = (r, m)$ composed of the first layer $r$ and the memory $m$.

Transition rates with three variables that do not change simultaneously in a jump, as in Eq. (22), form a tripartite system, which is a particular case of a multipartite Markov process [46]. The transfer entropy in this case fulfills the relation

$$T_{x\to y} = T_{x\to r}, \tag{24}$$

where

$$T_{x\to r} \equiv \frac{H[r_{t+dt}|\{r_{t'}\}_{t'\le t}] - H[r_{t+dt}|\{r_{t'}\}_{t'\le t}, x_t]}{dt}. \tag{25}$$

Relation (24) means that the transfer entropy from the signal $x$ to the sensor $y = (r, m)$ is equal to the transfer entropy from $x$ to the first layer of the sensor $r$. This relation is a consequence of the causal relation $x \to r \to m$ and can be demonstrated as follows.

By defining $z_t \equiv (x_t, r_t, m_t)$ the conditional probability $P(z_{t+dt}|z_t)$ can be written as

$$P(z_{t+dt}|z_t) = P(x_{t+dt}|x_t)\, P(r_{t+dt}|x_t, r_t)\, P(m_{t+dt}|r_t, m_t), \tag{26}$$

which follows from the structure of the rates in Eq. (22). From the definition of the conditional Shannon entropy (3), Eq. (26) implies the following relations

$$H[z_{t+dt}|z_t] = H[x_{t+dt}|x_t] + H[r_{t+dt}|x_t, r_t] + H[m_{t+dt}|r_t, m_t], \tag{27}$$

and

$$H[y_{t+dt}|y_t, x_t] = H[r_{t+dt}, m_{t+dt}|r_t, m_t, x_t] = H[r_{t+dt}|r_t, x_t] + H[m_{t+dt}|r_t, m_t]. \tag{28}$$

For large time $t$, the Markov property $P(z_{t+dt}|z_t) = P(z_{t+dt}|\{z_{t'}\}_{t'\le t})$ and (26) lead to

$$H[y_{t+dt}|\{y_{t'}\}_{t'\le t}] = H[r_{t+dt}, m_{t+dt}|\{r_{t'}\}_{t'\le t}, \{m_{t'}\}_{t'\le t}] = H[r_{t+dt}|\{r_{t'}\}_{t'\le t}] + H[m_{t+dt}|m_t, r_t]. \tag{29}$$

Finally, from Eqs. (28) and (29) we obtain the transfer entropy rate (12) in the form

$$T_{x\to y} = \frac{H[y_{t+dt}|\{y_{t'}\}_{t'\le t}] - H[y_{t+dt}|y_t, x_t]}{dt} = \frac{H[r_{t+dt}|\{r_{t'}\}_{t'\le t}] - H[r_{t+dt}|r_t, x_t]}{dt}, \tag{30}$$
FIG. 3. (Color online) Cellular two-component network sensing an external ligand concentration. The total number of receptors is $N_b = 7$ and the number of occupied receptors is $n_b = 3$. The number of internal proteins, which constitute the memory, is $N_y = 10$ with $n_y = 4$ of them phosphorylated. The number of occupied receptors affects the transition rates related to the phosphorylation of internal proteins.


which after a comparison with (25) yields the desired equality (24).

From the definition of the upper bound on the transfer entropy rate (17) and Eq. (26) we obtain

$$\bar{T}_{x\to y} = \frac{H[r_{t+dt}|r_t, m_t] - H[r_{t+dt}|r_t, x_t]}{dt}. \tag{31}$$

Hence, the inequality $H[r_{t+dt}|r_t, m_t] \le H[r_{t+dt}|r_t]$ leads to

$$\bar{T}_{x\to y} \le \bar{T}_{x\to r}, \tag{32}$$

where

$$\bar{T}_{x\to r} \equiv \frac{H[r_{t+dt}|r_t] - H[r_{t+dt}|r_t, x_t]}{dt}. \tag{33}$$

Note that inequality (32) is the opposite of what happens to the learning rate, i.e., $l_r \le l_y$. The chain of inequalities that summarizes the inequalities discussed in this section involving learning rate, coarse grained learning rate, transfer entropy rates and upper bounds on transfer entropy rates is given by

$$l_r \le l_y \le T_{x\to r} = T_{x\to y} \le \bar{T}_{x\to y} \le \bar{T}_{x\to r}. \tag{34}$$

The adaptation of the expressions from this section to the continuous limit, where the master equation becomes a Fokker-Planck equation, is presented in Appendix A.


III. CELLULAR TWO COMPONENT 
NETWORK SENSING AN EXTERNAL LIGAND 
CONCENTRATION 


As a physical realization of a sensor we consider the cellular two component network sensing a fluctuating ligand concentration shown in Fig. 3 (see [58] for a similar setup). The signal $x$ is related to the external ligand concentration $s$ through the expression $x = \ln(s/s_0)$, where $s_0$ is some base concentration value. The first layer of the two-component network, which is the degree of freedom directly sensing the external concentration, is composed of the receptors. Each receptor can be either bound by a ligand or empty, with the possible values of the number of bound receptors given by $n_b = 0, 1, \ldots, N_b$, where $N_b$ is the total number of receptors. The second layer of the two-component network is composed of internal proteins Y that can be phosphorylated to the state Y*. The number of proteins in this phosphorylated form takes the values $n_y = 0, 1, \ldots, N_y$, where $N_y$ is the total number of proteins. This second degree of freedom is the memory of the sensor: the phosphorylation/dephosphorylation reaction rates depend on $n_b$, whereas $n_y$ has no influence on the transition rates changing the number of occupied receptors. A state of the sensor is fully characterized by $y = (n_b, n_y)$.

The rates with which the concentration changes are written as

$$w^{(x)}_{\pm}(x) \equiv \frac{D_x}{dx^2} \exp\left(\mp \frac{\omega_x x}{2 D_x}\, dx\right), \tag{35}$$

where $x$ is a multiple of $dx$ and the "+" sign indicates a jump from $x$ to $x + dx$ while the "−" sign indicates a jump from $x$ to $x - dx$. As shown in Appendix A, the limit $dx \to 0$ yields the continuous Langevin equation

$$\dot{x}_t = -\omega_x x_t + \zeta^{x}_t \tag{36}$$

for the dynamics of the signal. The white noise $\zeta^{x}_t$ fulfills the relation

$$\langle \zeta^{x}_t \zeta^{x}_{t'} \rangle = 2 D_x\, \delta(t - t'), \tag{37}$$

where the brackets denote an average over stochastic trajectories.

The number of occupied receptors changes with rates

$$w^{(r)}_{+}(x, n_b) \equiv \omega^{+}(x)\,[N_b - n_b], \qquad w^{(r)}_{-}(x, n_b) \equiv \omega^{-}(x)\, n_b, \tag{38}$$

where $\omega^{+}(x)$ is the rate for the binding of a ligand to any free receptor and $\omega^{-}(x)$ is the rate for the unbinding of a ligand from any occupied receptor. These rates fulfill the generalized detailed balance relation $\omega^{+}(x)/\omega^{-}(x) = \exp[\Delta F(x)]$, where $\Delta F(x)$ is the free energy difference between empty and occupied receptor and $k_B T = 1$ throughout.

The phosphorylation reaction of a single internal protein takes place with rates

$$\mathrm{Y} + \mathrm{ATP} \underset{n_b \kappa_-}{\overset{n_b \kappa_+}{\rightleftharpoons}} \mathrm{Y}^* + \mathrm{ADP}, \tag{39}$$

which are proportional to the number of bound receptors $n_b$. Besides this chemical reaction the internal proteins can also be dephosphorylated through the reaction

$$\mathrm{Y}^* \underset{\nu_-}{\overset{\nu_+}{\rightleftharpoons}} \mathrm{Y} + \mathrm{P_i}, \tag{40}$$

where the rates are independent of $n_b$. The rates in (39) and (40) fulfill the relation $\ln[\kappa_+ \nu_+/(\kappa_- \nu_-)] = \Delta\mu$, where $\Delta\mu \equiv \mu_{\rm ATP} - \mu_{\rm ADP} - \mu_{\rm P_i}$ is the free energy liberated in one ATP hydrolysis. We define the total transition rates for individual proteins as

$$\omega^{+}_m(n_b) \equiv n_b \kappa_+ + \nu_-, \qquad \omega^{-}_m(n_b) \equiv n_b \kappa_- + \nu_+. \tag{41}$$

With these rates for the change of an individual protein we obtain the transition rates for a change in the variable $n_y$,

$$w^{(m)}_{+}(n_b, n_y) \equiv \omega^{+}_m(n_b)\,[N_y - n_y], \qquad w^{(m)}_{-}(n_b, n_y) \equiv \omega^{-}_m(n_b)\, n_y. \tag{42}$$

The entropy production due to the sensor jumps $\sigma_y$ has two contributions. The first is due to jumps that change the receptor occupancy

$$\sigma_r \equiv \sum_{x, n_b} J_r(x, n_b) \ln\frac{w^{(r)}_{+}(x, n_b)}{w^{(r)}_{-}(x, n_b + 1)}, \tag{43}$$

where

$$J_r(x, n_b) \equiv P(x, n_b)\, w^{(r)}_{+}(x, n_b) - P(x, n_b + 1)\, w^{(r)}_{-}(x, n_b + 1) \tag{44}$$

is the probability current. The second is due to jumps that change the number of phosphorylated internal proteins

$$\sigma_m \equiv \sum_{n_b, n_y} J_m(n_b, n_y) \ln\frac{w^{(m)}_{+}(n_b, n_y)}{w^{(m)}_{-}(n_b, n_y + 1)}, \tag{45}$$

where

$$J_m(n_b, n_y) \equiv P(n_b, n_y)\, w^{(m)}_{+}(n_b, n_y) - P(n_b, n_y + 1)\, w^{(m)}_{-}(n_b, n_y + 1). \tag{46}$$

The quantity $\sigma_r$ corresponds to the rate of dissipated heat due to binding and unbinding of ligands at different concentration values. This dissipated heat is compensated by work that is done by the external signal. The quantity $\sigma_m$ is the rate of dissipated free energy related to the consumption of ATP inside the cell. Actually, since we are not considering each individual link with the phosphorylation and dephosphorylation chemical reactions, but rather the total transition rates in Eq. (41), $\sigma_m$ is a lower bound on the rate of heat dissipated due to ATP consumption. A thorough discussion on the physical origin of different terms in the entropy production for related models can be found in [55].

As shown in Appendix A, taking the linear noise approximation and assuming a signal with small fluctuations, the transition rates in Eqs. (35), (38), and (42) lead to the Langevin equations

$$\begin{aligned} \dot{x}_t &= -\omega_x x_t + \zeta^{x}_t && \text{(signal)}, \\ \dot{r}_t &= -\omega_r (r_t - x_t) + \zeta^{r}_t && \text{(sensor)}, \\ \dot{m}_t &= -\omega_m (m_t - r_t) + \zeta^{m}_t && \text{(memory)}, \end{aligned} \tag{47}$$

where $\langle \zeta^{i}_t \zeta^{j}_{t'} \rangle = 2 D_i \delta_{ij} \delta(t - t')$ for $i, j = x, r, m$. The variable $r$ is related to the number of bound receptors, as shown in Eq. (A17), and the memory $m$ to the number of phosphorylated internal proteins, as shown in Eq. (A18). The precise relations between the parameters in these equations and the transition rates can be found in Appendix A. There are three key points about these relations. First, for $\Delta\mu = 0$, i.e., without free energy dissipation due to ATP hydrolysis inside the cell, the memory becomes decoupled from the receptor and has no information about the signal, which in Eq. (47) implies $D_m \to \infty$. Second, the noise amplitude $D_r$ is inversely proportional to the total number of receptors $N_b$. Third, the noise amplitude $D_m$ is inversely proportional to the total number of internal proteins $N_y$.


IV. SENSORY CAPACITY AND EFFICIENCY 
FOR MODEL SYSTEM 

A. Bare sensor 

First we consider a bare sensor without memory, i.e., the Langevin equations (47) without the variable $m$. We use the subscript $r$ for the sensory capacity $C_r$ and the efficiency $\eta_r$ for the bare sensor of this subsection in order to differentiate it from the sensor with a memory analyzed in the next subsection. The corresponding Lyapunov equation for the covariance matrix

$$\Sigma \equiv \begin{pmatrix} \Sigma_{xx} & \Sigma_{xr} \\ \Sigma_{xr} & \Sigma_{rr} \end{pmatrix} \equiv \begin{pmatrix} \langle x_t x_t \rangle & \langle x_t r_t \rangle \\ \langle x_t r_t \rangle & \langle r_t r_t \rangle \end{pmatrix} \tag{48}$$

reads [65, 66]

$$\dot{\Sigma} = -A\Sigma - \Sigma A^{T} + 2D, \tag{49}$$

where

$$A \equiv \begin{pmatrix} \omega_x & 0 \\ -\omega_r & \omega_r \end{pmatrix} \quad \text{and} \quad D \equiv \begin{pmatrix} D_x & 0 \\ 0 & D_r \end{pmatrix}. \tag{50}$$

The steady state solution of (49) is

$$\Sigma = \epsilon_x^2 \begin{pmatrix} 1 & \dfrac{\nu_r}{\nu_r + 1} \\ \dfrac{\nu_r}{\nu_r + 1} & \dfrac{\nu_r}{\nu_r + 1} + \dfrac{B_r}{\nu_r} \end{pmatrix}, \tag{51}$$

where $\epsilon_x^2 \equiv D_x/\omega_x$ is the signal variance, $\nu_r \equiv \omega_r/\omega_x$ and $B_r \equiv D_r/D_x$.

As shown in Appendix A, the learning rate is

$$l_r = \omega_x \frac{\nu_r^3}{\nu_r^2 + B_r (1 + \nu_r)^2}. \tag{52}$$
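The steady-state covariance in Eq. (51) can be cross-checked by integrating the Lyapunov equation (49) directly. The following is a minimal sketch with hypothetical parameter values ($\omega_x = 1$, $D_x = 0.1$, $\nu_r = 10$, $B_r = 1$); the explicit Euler scheme, step size and integration time are our choices for illustration and are not part of the paper.

```python
def relax_lyapunov(omega_x, D_x, nu_r, B_r, dt=1e-3, t_max=50.0):
    """Integrate dSigma/dt = -A Sigma - Sigma A^T + 2D (Eq. 49) with Euler steps.

    The matrices A and D are those of Eq. (50); the equation is written out
    component by component for the symmetric 2x2 covariance matrix.
    """
    omega_r = nu_r * omega_x
    D_r = B_r * D_x
    s_xx = s_xr = s_rr = 0.0
    for _ in range(int(round(t_max / dt))):
        d_xx = -2.0 * omega_x * s_xx + 2.0 * D_x
        d_xr = -(omega_x + omega_r) * s_xr + omega_r * s_xx
        d_rr = -2.0 * omega_r * (s_rr - s_xr) + 2.0 * D_r
        s_xx += dt * d_xx
        s_xr += dt * d_xr
        s_rr += dt * d_rr
    return s_xx, s_xr, s_rr


def closed_form(omega_x, D_x, nu_r, B_r):
    """Stationary covariance of the bare sensor as given in Eq. (51)."""
    eps2 = D_x / omega_x   # signal variance
    s_xr = eps2 * nu_r / (nu_r + 1.0)
    s_rr = eps2 * (nu_r / (nu_r + 1.0) + B_r / nu_r)
    return eps2, s_xr, s_rr


PARAMS = dict(omega_x=1.0, D_x=0.1, nu_r=10.0, B_r=1.0)   # hypothetical values
NUMERIC = relax_lyapunov(**PARAMS)
EXACT = closed_form(**PARAMS)
print(NUMERIC)
print(EXACT)
```

Because the fixed point of the Euler iteration coincides with the stationary solution of Eq. (49), the two triples agree up to rounding once the transient has decayed.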
FIG. 4. (Color online) Sensor performance as function of sensor noise $B_r = D_r/D_x$. (a) Transfer entropy $T_{x\to r}$ and learning rate $l_r$ are displayed. The vertical dotted line at $B_r = \nu_r^2/(\nu_r^2 - 1)$ indicates the value for which $C_r = 1$, i.e., $l_r = T_{x\to r}$. (b) Efficiency ($\eta_r = l_r/\sigma_r$) and capacity ($C_r = l_r/T_{x\to r}$) of the bare sensor. At maximal capacity $C_r = 1$ the efficiency is $\eta_r = 1/2$ and $T_{x\to r} = \bar{T}_{x\to r}$. (c) Comparison of errors. For $C_r = 1$ the inequality $\epsilon_{x|r_{\rm traj}} \le \epsilon_{x|r}$ saturates. Parameters: $\omega_x = 1$, $D_x = 0.1$, $\nu_r = \omega_r/\omega_x = 10$.


The transfer entropy rate for the linear Langevin equations (47) is given by [45]

$$T_{x\to r} = \frac{\omega_x}{2} \left( \sqrt{1 + \frac{\nu_r^2}{B_r}} - 1 \right). \tag{53}$$

The learning rate and transfer entropy rate as functions of $B_r$ are plotted in Fig. 4(a). Both quantities get smaller as the noise amplitude of the sensor gets larger. At an intermediate value of $B_r = \nu_r^2/(\nu_r^2 - 1)$ learning rate and transfer entropy become the same, leading to a sensory capacity $C_r = 1$, as shown in Fig. 4(b).

Since the bare sensor does not have a memory there is no ATP consumption inside the cell and the entropy production is equal to the rate of work done by the external signal, which, as calculated in Appendix A in Eq. (A38), is

$$\sigma_r = \omega_x \frac{\nu_r^2}{B_r (1 + \nu_r)}. \tag{54}$$

This entropy production decreases with $B_r$, i.e., a sensor with smaller noise amplitude, which can be obtained by increasing the number of receptors [see Eq. (A14)], implies more energy dissipation. In Fig. 4(b) the thermodynamic efficiency is compared with sensory capacity. The efficiency increases with $B_r$. For $B_r = \nu_r^2/(\nu_r^2 - 1)$, where $C_r = 1$, the efficiency is $\eta_r = 1/2$. As we show in Sec. V there is a general tradeoff between efficiency and sensory capacity, with $C = 1$ implying $\eta \le 1/2$.

The upper bound on the transfer entropy rate, calculated in Appendix A, reads

$$\bar{T}_{x\to r} = \frac{\omega_x \nu_r^2}{4 B_r}\, \frac{\nu_r^2 + B_r (1 + \nu_r)^2}{\nu_r^3 + \nu_r^2 + B_r (1 + \nu_r)^2}. \tag{55}$$

This quantity has also been calculated in [59]. Comparing the upper bound with the transfer entropy rate in Fig. 4(b) we observe that for this model when sensory capacity is one we have $T_{x\to r} = \bar{T}_{x\to r}$. This fact plays an important role in the general tradeoff between sensory capacity and efficiency proved in Sec. V.

In Appendix B we define the uncertainties $\epsilon_{x|r}$ and $\epsilon_{x|r_{\rm traj}}$ about the signal given the sensor state and the sensor trajectory, respectively. As shown in Appendix B, $\epsilon^2_{x|r_{\rm traj}}$ is proportional to the transfer entropy rate $T_{x\to r}$ and $\epsilon^2_{x|r}$ is proportional to the upper bound $\bar{T}_{x\to r}$ for the present model. Hence, the equality between transfer entropy rate and upper bound for $C_r = 1$ implies that both uncertainties are also the same, as shown in Fig. 4(c).


B. Memory increases sensory capacity 

For the regimes where the bare sensor does not reach a sensory capacity close to 1, it is possible to increase this sensory capacity by adding a memory to the bare sensor, which leads to the third equation in (47). The Lyapunov equation (49) for this case has the 3x3 matrices

$$A \equiv \begin{pmatrix} \omega_x & 0 & 0 \\ -\omega_r & \omega_r & 0 \\ 0 & -\omega_m & \omega_m \end{pmatrix} \quad \text{and} \quad D \equiv \begin{pmatrix} D_x & 0 & 0 \\ 0 & D_r & 0 \\ 0 & 0 & D_m \end{pmatrix}. \tag{56}$$

The stationary solution of (49) is too long to be displayed here.


The expression for the learning rate $l_y$ is given in Appendix A in Eq. (A44). As shown in Eq. (24), the addition of the memory does not change the transfer entropy $T_{x\to y} = T_{x\to r}$, which remains as given by (53). The coarse grained learning rate is the learning rate for the bare sensor calculated in Eq. (52). The quantities $l_y$, $l_r$ and $T_{x\to r}$ are plotted in Fig. 5(a) as a function of the noise amplitude $B_m \equiv D_m/D_x$. For larger values of $B_m$ the learning rate $l_y$ becomes equal to $l_r$, i.e., the learning rate does not increase substantially with the addition of a memory with large noise amplitude. By decreasing the noise amplitude, $l_y$ increases until it reaches the transfer entropy $T_{x\to r}$ for small $B_m$. Hence, the sensory capacity $C$ increases with decreasing $B_m$, as shown in Fig. 5(b).
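The effect of the memory on the sensory capacity can be reproduced with a short numerical sketch. Several ingredients below are assumptions on our side rather than expressions quoted from the paper: the stationary covariances are obtained by solving the 3x3 Lyapunov equation entry by entry; the learning rate is evaluated through the Gaussian identity $l = \omega_x\,[\Sigma_{xx}/{\rm Var}(x|\cdot) - 1]$, which follows from the conservation law $l_y = h_x$ for the linear signal dynamics and is used here instead of Eq. (A44); and the transfer entropy rate is taken in the form $T_{x\to r} = (\omega_x/2)(\sqrt{1 + \nu_r^2/B_r} - 1)$. The value $D_x = 0.1$ is hypothetical.

```python
import math


def stationary_covariance(omega_x, D_x, nu_r, B_r, nu_m, B_m):
    """Solve 0 = -A Sigma - Sigma A^T + 2D for the 3x3 system, entry by entry."""
    omega_r, omega_m = nu_r * omega_x, nu_m * omega_x
    D_r, D_m = B_r * D_x, B_m * D_x
    s_xx = D_x / omega_x
    s_xr = omega_r * s_xx / (omega_x + omega_r)
    s_rr = s_xr + D_r / omega_r
    s_xm = omega_m * s_xr / (omega_x + omega_m)
    s_rm = (omega_r * s_xm + omega_m * s_rr) / (omega_r + omega_m)
    s_mm = s_rm + D_m / omega_m
    return s_xx, s_xr, s_rr, s_xm, s_rm, s_mm


def capacities(omega_x, D_x, nu_r, B_r, nu_m, B_m):
    """Return (C_r, C): capacity of the bare sensor and of the sensor with memory."""
    s_xx, s_xr, s_rr, s_xm, s_rm, s_mm = stationary_covariance(
        omega_x, D_x, nu_r, B_r, nu_m, B_m)
    # Conditional variances of the Gaussian steady state
    var_given_r = s_xx - s_xr ** 2 / s_rr
    det = s_rr * s_mm - s_rm ** 2
    explained = (s_xr ** 2 * s_mm - 2.0 * s_xr * s_xm * s_rm
                 + s_xm ** 2 * s_rr) / det
    var_given_rm = s_xx - explained
    # Assumed Gaussian identity for the learning rate (see lead-in)
    l_r = omega_x * (s_xx / var_given_r - 1.0)
    l_y = omega_x * (s_xx / var_given_rm - 1.0)
    # Assumed form of the transfer entropy rate, unchanged by the memory
    transfer = 0.5 * omega_x * (math.sqrt(1.0 + nu_r ** 2 / B_r) - 1.0)
    return l_r / transfer, l_y / transfer


NU_R, B_R = 10.0, 1e-2
NU_M = math.sqrt(1.0 + NU_R ** 2 / B_R)        # memory time scale, about 100
B_M_VALUES = [1e2, 1.0, 1e-2, 1e-4, 1e-6]      # decreasing memory noise
RESULTS = [capacities(1.0, 0.1, NU_R, B_R, NU_M, b_m) for b_m in B_M_VALUES]
C_BARE = RESULTS[0][0]
C_MEMORY = [c for _, c in RESULTS]
print(C_BARE, C_MEMORY)
```

With these parameters the bare sensor has a capacity of about 0.2, while the capacity of the sensor with memory grows monotonically as $B_m$ decreases and approaches 1 for the smallest value in the list.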
FIG. 5. (Color online) Effect of a memory. (a) Transfer entropy $T_{x\to r}$, learning rate of the bare sensor $l_r$ and of the full sensor $l_y$ (including the memory) as function of the memory noise $B_m = D_m/D_x$. The transfer entropy estimate $\bar{T}_{x\to y}$ and the learning rate $l_y$ approach $T_{x\to r}$ for $B_m \to 0$. (b) Sensory capacities $C = l_y/T_{x\to y}$ and $C_r = l_r/T_{x\to r}$ in comparison with thermodynamical efficiency $\eta = l_y/\sigma_y$. (c) Effect of memory on error. The error $\epsilon_{x|y}$ corresponding to the full sensor state approaches the minimal error $\epsilon_{x|r_{\rm traj}}$ for $B_m \to 0$. Parameters: $\omega_x = 1$, $D_x = 10^{-\ldots}$, $\nu_r = \omega_r/\omega_x = 10$, $B_r = D_r/D_x = 10^{-2}$, and $\nu_m = \omega_m/\omega_x = \sqrt{1 + \nu_r^2/B_r} \approx 100$.


The rate of free energy dissipation has now two contributions, i.e., $\sigma_y = \sigma_r + \sigma_m$. The $\sigma_r$ given by (54) corresponds to the work done by the external signal. The additional term, which is derived in Appendix A in Eq. (A39), is given by

$$\sigma_m = \omega_x \frac{\nu_m^2 \left[ \nu_r^2 + B_r (1 + \nu_m)(1 + \nu_r) \right]}{B_m (1 + \nu_m)(1 + \nu_r)(\nu_m + \nu_r)}, \tag{57}$$

where $\nu_m \equiv \omega_m/\omega_x$. This $\sigma_m$ is a lower bound on the rate of dissipated free energy due to ATP consumption. From expression (57), the decrease in the noise amplitude $D_m$, which leads to an increase in sensory capacity, implies an increase in the rate of ATP consumption inside the cell. Adding a dissipative memory to a bare sensor can lead to an increase in sensory capacity. This increase corresponds to how much of the information about the trajectory $\{r_{t'}\}_{t'\le t}$ is contained in the instantaneous state of the memory $m_t$.

For fixed $B_m$, the sensory capacity $C$ as a function of $\nu_m = \omega_m/\omega_x$ has a maximum, as shown in the contour plot in Fig. 6. Therefore, for a given $\omega_x$, which characterizes the time-scale of changes in the external signal, the memory has an optimal $\omega_m$, which characterizes the time-scale of changes in the memory. A sensory capacity close to 1 is reached for small $B_m$ and $\nu_m \approx \sqrt{1 + \nu_r^2/B_r}$, as indicated by the red region in Fig. 6.

FIG. 6. (Color online) Effect of memory parameters $\nu_m$ and $B_m$ on the sensory capacity. For $\nu_m \approx \sqrt{1 + \nu_r^2/B_r} \approx 10^2$ and $B_m \to 0$ the capacity saturates ($C \to 1$). The star (★) marks the parameter $(\nu_m^*, B_m^*)$ for which the efficiency $\eta$ is maximal (here $\eta^* \approx 0.024$). The remaining parameters are chosen as in Fig. 5.

A larger $\sigma_m$ leads to a lower efficiency, as shown in Fig. 5(b). Adding a memory with a high rate of dissipation due to ATP consumption can increase a low sensory capacity to the limit $C = 1$. In this case when $C = 1$ the efficiency is small due to the high dissipation of the memory. For example, the maximal efficiency that is achieved in the region plotted in Fig. 6 is $\eta \approx 0.024$. In this regime of high internal dissipation the efficiency does not seem to be a relevant quantity to characterize the performance of the sensor, which is rather given by sensory capacity.

As shown in Appendix B, for a sensor with a memory, the uncertainty taking the instantaneous state of the sensor into account is proportional to the upper bound on the transfer entropy rate. As is the case of the transfer entropy, the uncertainty taking the full time series of the sensor into account does not change with the addition of the memory. Therefore, also for the present case $C = 1$ implies that both uncertainties are equal, as shown in Fig. 5(c).


V. TRADEOFF BETWEEN SENSORY 
CAPACITY AND EFFICIENCY 

A. Tradeoff for model system 

There are two situations for which the maximal sensory
capacity $C = 1$ can be reached. Either the parameters
related to the signal and the first layer of the sensor are
chosen in such a way that there is no further information
in the trajectory $\{r_{t'}\}_{t'\le t}$ as compared to the instantaneous
state $r_t$, or a dissipative memory is added to the
sensor. In the first case, the efficiency is $\eta = 1/2$ for
$C = 1$ and in the other case $\eta < 1/2$ due to the extra
dissipation inside the cell.

FIG. 7. Trade-off between capacity $C$ and efficiency $\eta$ (horizontal
axis: efficiency $\eta$). The parameters for the bare sensor $\nu_r$ and $B_r$
are chosen at random within a range bounded by powers of ten. For the
sensor with memory, in addition, the parameters $\nu_m$ and $B_m$ are chosen
in the same way. The solid lines indicate the bounds
$4\eta(1-\eta) \le C \le 2\sqrt{\eta(1-\eta)}$ for the bare sensor. Our numerics
indicate that the upper bound $C \le 2\sqrt{\eta(1-\eta)}$ is also valid for the
sensor with memory for $\eta > 1/2$.

The tradeoff between sensory capacity and efficiency
for the model system in Eq. (47) is shown in Fig. 7. For
the bare sensor we obtain the bounds

$$4\eta_r(1-\eta_r) \le C_r \le 2\sqrt{\eta_r(1-\eta_r)}, \tag{58}$$

which are derived in the following way. From (52) and
(54) the efficiency reads

$$\eta_r \equiv \frac{l_r}{\sigma_r} = \frac{B_r\nu_r(1+\nu_r)}{\nu_r^2 + B_r(1+\nu_r)^2}, \tag{59}$$

and from (52) and (53) the sensory capacity reads

$$C_r \equiv \frac{l_r}{\mathcal{T}_{x\to r}} = \frac{2\nu_r^3}{\left[\nu_r^2 + B_r(1+\nu_r)^2\right]\left[\sqrt{1+\nu_r^2/B_r} - 1\right]}. \tag{60}$$

The upper (lower) bound in Eq. (58) is obtained by
maximizing (minimizing) the capacity (60) with respect
to the variables $\nu_r, B_r > 0$ with the constraint that (59)
is fixed. Most prominently, the scatter plot in Fig. 7
shows that the upper bound in Eq. (58) also applies for
the full sensor with a memory in the region $\eta > 1/2$.
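The bounds on the bare sensor can be checked numerically with a short sketch. It assumes the closed forms $\eta_r = B_r\nu_r(1+\nu_r)/[\nu_r^2+B_r(1+\nu_r)^2]$ and $C_r = 2\nu_r^3/\{[\nu_r^2+B_r(1+\nu_r)^2][\sqrt{1+\nu_r^2/B_r}-1]\}$; the function names, the sampling range and the tolerance are hypothetical choices made for illustration.

```python
import math
import random


def efficiency_bare(nu_r, B_r):
    """Efficiency of the bare sensor, assuming the form of Eq. (59)."""
    return B_r * nu_r * (1 + nu_r) / (nu_r**2 + B_r * (1 + nu_r) ** 2)


def capacity_bare(nu_r, B_r):
    """Sensory capacity of the bare sensor, assuming the form of Eq. (60).

    The factor sqrt(1 + nu_r^2/B_r) - 1 is rewritten as
    (nu_r^2/B_r) / (q + 1) to avoid cancellation for large B_r.
    """
    q = math.sqrt(1 + nu_r**2 / B_r)
    return 2 * nu_r * B_r * (q + 1) / (nu_r**2 + B_r * (1 + nu_r) ** 2)


def bounds_hold(n_samples=2000, seed=0, tol=1e-9):
    """Sample (nu_r, B_r) log-uniformly and test both bounds of Eq. (58)."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        nu_r = 10 ** rng.uniform(-3, 3)  # hypothetical range
        B_r = 10 ** rng.uniform(-3, 3)
        eta = efficiency_bare(nu_r, B_r)
        cap = capacity_bare(nu_r, B_r)
        lower = 4 * eta * (1 - eta)
        upper = 2 * math.sqrt(eta * (1 - eta))
        if not (lower - tol <= cap <= upper + tol):
            return False
    return True
```

Both bounds meet at $\eta_r = 1/2$, $C_r = 1$, which in this parametrization corresponds to $\nu_r = \sqrt{1+\nu_r^2/B_r}$, for instance $\nu_r = 2$ and $B_r = 4/3$.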


B. General proof 


We now prove a general trade-off between sensory
capacity and efficiency: a sensory capacity $C = 1$ implies
$\eta \le 1/2$. Our proof depends on the reasonable assumption
that for any sensor it is possible to create a fictitious
memory such that the instantaneous state of the fictitious
sensor, composed of the sensor and the fictitious memory,
contains the whole history of the sensor. From the
calculations for the model system in Sec. IV, we expect
this fictitious memory to have two general characteristics.
First, it must be precise. For the model system this
precision is characterized by a small $D_m$ in Eq. (47), which
can be achieved if the total number of proteins
inside the cell is very large, i.e., the memory has a large
number of possible states. Second, the time scale for
changes in states of the fictitious memory must be tuned
to some optimal value. For the model system this time
scale is characterized by $\omega_m$ in Eq. (47). For a system
that is more elaborate than our model system one can
think of a multicomponent memory with the time-scale
of each component optimally tuned to store information
about a certain part of the sensor.

From the chain of inequalities, summarized in (34),
adding the memory raises the learning rate and lowers the
upper bound on the transfer entropy rate. In a first step, we
impose that (i) $C = 1$ and (ii) that the transfer entropy
rate is equal to the upper bound, i.e., $l_y = \mathcal{T}_{x\to y} = \overline{\mathcal{T}}_{x\to y}$.
From relations (8) and (18) we obtain

$$\overline{\mathcal{T}}_{x\to y} - l_y = \sum_{y,y'} P(y) \sum_x P(x|y)\, w^x_{yy'} \ln\frac{P(x|y)\, w^x_{yy'}}{P(x|y')\, \overline{w}_{yy'}} \ge 0, \tag{61}$$

with the coarse-grained rate $\overline{w}_{yy'} \equiv \sum_x P(x|y)\, w^x_{yy'}$,
where the log sum inequality above is saturated if and
only if the term inside the logarithm is independent of $x$
[67]. Hence, if $\overline{\mathcal{T}}_{x\to y} = l_y$ then the rates obey

$$w^x_{yy'} = \frac{P(x|y')}{P(x|y)}\, \overline{w}_{yy'}. \tag{62}$$

With this restriction, Eq. (8) and Eq. (20), the entropy
production (15) becomes

$$\sigma_y = 2 l_y + \tilde{\sigma}_y. \tag{63}$$

The efficiency (16) then reads

$$\eta_y = \frac{l_y}{\sigma_y} = \frac{1}{2}\,\frac{\sigma_y - \tilde{\sigma}_y}{\sigma_y} \le \frac{1}{2}, \tag{64}$$

where we used $\sigma_y \ge \tilde{\sigma}_y \ge 0$. Hence, if $C = 1$ and
$\overline{\mathcal{T}}_{x\to y} = \mathcal{T}_{x\to y}$, the efficiency fulfills $\eta_y \le 1/2$.

We now demonstrate that $C = 1$ indeed implies
$\mathcal{T}_{x\to y} = \overline{\mathcal{T}}_{x\to y}$, which completes the proof of the tradeoff.
A fictitious memory $a$ is added to the sensor $y$. The
transition rates are now of the form of Eq. (22) with $y$
replacing $r$ and $a$ replacing $m$. The learning rate of this
fictitious sensor composed of $z = (y, a)$ reads

$$l_z = \sum_{x,x',y,a} P(x,y,a)\, w_{xx'} \ln\frac{P(x,y,a)}{P(x',y,a)}, \tag{65}$$
where we used Eqs. (10) and (11). Within this fictitious
sensor $l_y$ is a coarse-grained learning rate and the
difference between $l_z$ and $l_y$ reads

$$l_z - l_y = \sum_{x,x',y} P(x,y)\, w_{xx'} \sum_a P(a|x,y) \ln\frac{P(a|x,y)}{P(a|x',y)} \ge 0. \tag{66}$$

The assumption $C = 1$ implies $l_y = l_z$. The
above inequality is saturated if and only if
$P(a|x,y) = P(a|x',y) = P(a|y)$, yielding
$P(x|y,a) = P(a|x,y)P(x|y)/P(a|y) = P(x|y)$. This relation
leads to

$$H[x_t|y_t,a_t] = H[x_t|y_t]. \tag{67}$$

The fictitious memory $a$ is unspecified and the key assumption
for our demonstration is that it is always possible
for any sensor $y$ to find a fictitious memory $a$ that
fulfills the relation

$$H[x_t|y_t,a_t] = H[x_t|\{y_{t'}\}_{t'\le t}]. \tag{68}$$

If we choose such a fictitious memory then equality (67)
leads to

$$H[x_t|y_t] = H[x_t|\{y_{t'}\}_{t'\le t}]. \tag{69}$$

Hence, if it is possible to find a fictitious memory that
fulfills (68), then $C = 1$ implies (69). From (69) we obtain
$I[x_t:\{y_{t'}\}_{t'\le t}] = I[x_t:y_t,y_{t-dt}] = I[x_t:y_t]$. The learning
rate in the form (11) can be rewritten as

$$l_y = \frac{I[x_t:y_t] - I[x_{t+dt}:y_t]}{dt} = \frac{I[x_{t+dt}:y_{t+dt}] - I[x_{t+dt}:y_t]}{dt} = \frac{I[x_{t+dt}:y_{t+dt},y_t] - I[x_{t+dt}:y_t]}{dt}, \tag{70}$$

where we used the steady state property $I[x_{t+dt}:y_{t+dt}] = I[x_t:y_t]$
from the first to the second line. Inserting the
conditional probabilities in terms of rates from Eq. (7)
into Eq. (70) leads to the completion of the proof, i.e.,

$$l_y = \sum_{x,y,y'} P(x,y)\, w^x_{yy'} \ln\frac{w^x_{yy'}}{\overline{w}_{yy'}} = \overline{\mathcal{T}}_{x\to y}.$$

Summarizing, we have demonstrated that $C = 1 \Rightarrow H[x_t|y_t] = H[x_t|\{y_{t'}\}_{t'\le t}] \Rightarrow l_y = \overline{\mathcal{T}}_{x\to y} \Rightarrow C = 1$. This
proof also implies that whenever $C = 1$ then the upper
bound is also equal to the transfer entropy rate, i.e.,
$l_y = \mathcal{T}_{x\to y} = \overline{\mathcal{T}}_{x\to y}$. For the coupled linear Langevin
equations analyzed in Sec. IV this equality between
transfer entropy rate and its upper bound implies the
equality between the uncertainty about the external signal
that is estimated with the instantaneous state of the
sensor and the uncertainty that is estimated with the full
time series of the sensor, as shown in Appendix B. For
general systems, it remains to be seen whether $C = 1$
implies that both uncertainties are the same.
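The log sum inequality behind Eqs. (61) and (62) can be illustrated numerically. The sketch below assumes that the gap between the upper bound and the learning rate is $\sum_{y,y'}P(y)\sum_x P(x|y)w^x_{yy'}\ln\{P(x|y)w^x_{yy'}/[P(x|y')\overline{w}_{yy'}]\}$ with $\overline{w}_{yy'} = \sum_x P(x|y)w^x_{yy'}$; the array layout and the helper names `bound_gap` and `saturating_rates` are hypothetical.

```python
import numpy as np


def bound_gap(P, w):
    """Gap of Eq. (61) for a joint distribution P[x, y] and rates w[x, y, y2].

    w[x, y, y2] is the rate of the sensor jump y -> y2 at fixed signal x.
    """
    Py = P.sum(axis=0)                          # P(y)
    Pxy = P / Py                                # P(x|y), one column per y
    wbar = np.einsum("xy,xyz->yz", Pxy, w)      # coarse-grained rates
    a = Pxy[:, :, None] * w                     # P(x|y) w^x_{yy'}
    b = Pxy[:, None, :] * wbar[None, :, :]      # P(x|y') wbar_{yy'}
    terms = Py[None, :, None] * a * np.log(a / b)
    off_diagonal = ~np.eye(P.shape[1], dtype=bool)
    return terms[:, off_diagonal].sum()


def saturating_rates(P, wbar):
    """Rates of the form of Eq. (62), for which the gap vanishes."""
    Pxy = P / P.sum(axis=0)
    return Pxy[:, None, :] / Pxy[:, :, None] * wbar[None, :, :]
```

For arbitrary positive rates the gap is non-negative, and it vanishes for rates built with `saturating_rates`, in line with the saturation condition stated after Eq. (61).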


VI. CONCLUSION 


We have introduced the quantity sensory capacity,
which provides a measure for the performance of a sensor
that follows an external signal. Specifically, the maximal
sensory capacity $C = 1$ means that the instantaneous
state of the sensor contains the same amount of information
about the signal as the full time-series of the sensor.
As we have shown with the coupled linear Langevin equations
in Sec. IV, a high sensory capacity can be achieved
in two ways. First, for a bare sensor without a memory
layer the parameters related to the sensor can be tuned
in such a way that $C = 1$. In this case there is no further
information available in the full time series of the
degree of freedom directly sensing the signal. Second,
the more interesting case is when the full time series of
this first degree of freedom has more information than
its instantaneous state. By adding a memory, which is a
second degree of freedom that is influenced by the first
degree of freedom but does not react back on it, the sensory
capacity can be raised to $C = 1$. This increase in
sensory capacity quantifies how much information about
the time-series of the sensor is stored in the instantaneous
state of the memory.

The coupled linear Langevin equations have been de¬ 
rived from a cellular two component network sensing 
an external ligand concentration, which is the signal. 
Within this physical realization of a sensor the first layer
of the sensor consists of the receptors that bind external ligand
and the memory is composed of internal proteins that
can be phosphorylated. We have shown that the ther¬ 
modynamic entropy production quantifying dissipation 
has two terms: work done by the external process due 
to binding and unbinding at different concentrations and 
dissipation inside the cell due to ATP hydrolysis. Adding 
a memory that increases the sensory capacity of a sen¬ 
sor from a low value to a value close to one requires a 
high rate of dissipation inside the cell. Sensory capacity 
is particularly interesting in this regime of high dissipa¬ 
tion, where the efficiency is very low and, therefore, does 
not characterize well the performance of the sensor. 

Finally, we have demonstrated a general tradeoff between
sensory capacity and efficiency. A sensory capacity
$C = 1$ implies an efficiency $\eta \le 1/2$. The limit $\eta = 1/2$ is
achieved for a bare sensor with its parameters optimally
tuned so that $C = 1$. If these parameters are not optimally
tuned, $C = 1$ is possible only with an additional
memory that leads to extra dissipation in relation to the
bare sensor, which implies $\eta < 1/2$.

This tradeoff relation between the two bounded dimensionless
quantities $C$ and $\eta$ provides a further link
between information theory and thermodynamics. The
sensory capacity $C$ as a ratio between learning rate and
transfer entropy rate is of purely information theoretic
origin whereas the efficiency $\eta$ as a ratio between learning
rate and entropy production contains input from both
fields. As a perspective for future work, the role of nonlinearities
in these figures of merit could be explored in
more complex models.

An experimental realization verifying the second law 
for a sensor that involves the rate of dissipated heat and 
the learning rate is still lacking. A good candidate for 
such an experiment is a colloidal particle, which is the 
sensor, subjected to an external potential that is varied 
stochastically. An experiment with a sensor that has an 
internal memory seems to be even more challenging. 


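Appendix A: From Master Equation to Langevin
Equation in bipartite processes

1. Linear noise approximation

We consider a vector $z = (z_1,\ldots,z_d)$ determining the
state of the system. Comparing with Sec. II, the first
component is related to the signal, i.e., $z_1 = x$. The
other components are related to the sensor. If the sensor
has only one component $r$ then $z_2 = r$. A sensor with a
memory also has a second component $y = (r,m)$, leading
to $z_3 = m$. For the variable $z_1 = x$ we denote the
transition rate $w_{xx'} = w^{(1)}_\pm(z)$ for $x' = x \pm dx$, where $dx$
corresponds to an infinitesimal change in the variable $x$.
The master equation is written as

$$\frac{d}{dt}P(z) = \sum_{i=1}^{d}\left[w^{(i)}_+(z-dz_i)P(z-dz_i) - w^{(i)}_+(z)P(z)\right] + \sum_{i=1}^{d}\left[w^{(i)}_-(z+dz_i)P(z+dz_i) - w^{(i)}_-(z)P(z)\right]. \tag{A1}$$

With the approximation

$$w^{(i)}_\pm(z \mp dz_i)P(z \mp dz_i) \approx w^{(i)}_\pm(z)P(z) \mp dz_i\frac{\partial}{\partial z_i}\left[w^{(i)}_\pm(z)P(z)\right] + \frac{dz_i^2}{2}\frac{\partial^2}{\partial z_i^2}\left[w^{(i)}_\pm(z)P(z)\right], \tag{A2}$$

the master equation (A1) turns into the Fokker-Planck
equation

$$\frac{\partial}{\partial t}p(z) = -\sum_i \frac{\partial}{\partial z_i}J_i(z), \tag{A3}$$

where in the continuous limit $P(z) \to p(z)\prod_i dz_i$. The
probability current reads

$$J_i(z) = D_i(z)F_i(z)p(z) - \frac{\partial}{\partial z_i}\left[D_i(z)p(z)\right], \tag{A4}$$

where

$$D_i(z)F_i(z) \equiv dz_i\left[w^{(i)}_+(z) - w^{(i)}_-(z)\right], \tag{A5}$$

and

$$D_i(z) \equiv \frac{dz_i^2}{2}\left[w^{(i)}_+(z) + w^{(i)}_-(z)\right]. \tag{A6}$$

Within the Ito interpretation [65, 66], the Fokker-Planck
equation (A3) corresponds to the Langevin equation

$$\dot{z}_i = D_i(z_t)F_i(z_t) + \xi^i_t, \tag{A7}$$

where $\langle \xi^i_t \xi^j_{t'}\rangle = 2D_i(z)\delta_{ij}\delta(t-t')$. The $\delta_{ij}$ term in this
last equation is a direct consequence of the bipartite (or
multipartite) structure of the transition rates.


2. Two component network with a weakly
fluctuating signal

The linear noise approximation for the specific model
of Sec. III is valid in the limit $N_b, N_y \gg 1$ and $dx \to 0$.
In this case, from the transition rates (35), (38), and (42),
the Langevin equation (A7) becomes

$$\dot{x}_t = -\omega_x x_t + \xi^x_t,$$
$$\dot{n}_b(t) = \omega_r^+(x_t)N_b - \left[\omega_r^+(x_t) + \omega_r^-(x_t)\right]n_b(t) + \xi^b_t,$$
$$\dot{n}_y(t) = \omega_m^+(n_b(t))N_y - \left[\omega_m^+(n_b(t)) + \omega_m^-(n_b(t))\right]n_y(t) + \xi^y_t. \tag{A8}$$

From Eq. (A6), the noise terms $\xi^b_t$ and $\xi^y_t$ fulfill a relation
similar to (37), with amplitudes

$$D_b(x,n_b) = \frac{1}{2}\left[\omega_r^+(x)(N_b-n_b) + \omega_r^-(x)n_b\right],$$
$$D_y(n_b,n_y) = \frac{1}{2}\left[\omega_m^+(n_b)(N_y-n_y) + \omega_m^-(n_b)n_y\right], \tag{A9}$$

respectively.

If the fluctuations of the signal are small such that $x$
stays close to the value $x = 0$ we can apply the following
expansion

$$N_b\,\omega_r^+(x)/[\omega_r^+(x)+\omega_r^-(x)] = n_b^* + \alpha_1 x + O(x^2), \tag{A10}$$

where $n_b^* \equiv N_b\,\omega_r^+(0)/[\omega_r^+(0)+\omega_r^-(0)]$ and $\alpha_1$ is the first
derivative evaluated at $x = 0$. For $n_b - n_b^*$ small,

$$N_y\,\omega_m^+(n_b)/[\omega_m^+(n_b)+\omega_m^-(n_b)] = n_y^* + \alpha_2(n_b-n_b^*) + O\left((n_b-n_b^*)^2\right), \tag{A11}$$

where $n_y^* \equiv N_y\,\omega_m^+(n_b^*)/[\omega_m^+(n_b^*)+\omega_m^-(n_b^*)]$ and $\alpha_2$ is the
first derivative evaluated at $n_b = n_b^*$. In the limit where
Eqs. (A10) and (A11) are valid, the Langevin equations
(A8) become

$$\dot{x}_t = -\omega_x x_t + \xi^x_t,$$
$$\dot{n}_b(t) = \omega_r\left[n_b^* + \alpha_1 x_t - n_b(t)\right] + \xi^b_t,$$
$$\dot{n}_y(t) = \omega_m\left[n_y^* + \alpha_2(n_b(t)-n_b^*) - n_y(t)\right] + \xi^y_t, \tag{A12}$$

where

$$\omega_r \equiv \omega_r^+(0) + \omega_r^-(0), \qquad \omega_m \equiv \omega_m^+(n_b^*) + \omega_m^-(n_b^*). \tag{A13}$$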
Furthermore, the noise amplitudes in Eq. (A9) become

$$D_b^* \equiv D_b(0,n_b^*) = \frac{\omega_r}{N_b}\,n_b^*(N_b-n_b^*), \qquad D_y^* \equiv D_y(n_b^*,n_y^*) = \frac{\omega_m}{N_y}\,n_y^*(N_y-n_y^*). \tag{A14}$$

The explicit form of the parameter $\alpha_1$ in (A10) is

$$\alpha_1 = \frac{n_b^*(N_b-n_b^*)}{N_b}\,\frac{d\Delta F(x)}{dx}\bigg|_{x=0}, \tag{A15}$$

and $\alpha_2$ in Eq. (A11) is

$$\alpha_2 = \frac{n_y^*(N_y-n_y^*)}{N_y}\,\frac{\kappa_+\nu_+ - \kappa_-\nu_-}{(n_b^*\kappa_+ + \nu_-)(n_b^*\kappa_- + \nu_+)}, \tag{A16}$$

as obtained from (41). Hence, for $\Delta\mu = \ln[\kappa_+\nu_+/(\kappa_-\nu_-)] = 0$
this last parameter is $\alpha_2 = 0$, i.e.,
the memory level in Eq. (A12) is not affected by the number
of occupied receptors. Therefore, ATP consumption
is necessary in order for the memory to be able to store
information about the signal.

The linear Langevin equations can be further simplified
with the transformations

$$r_t \equiv \frac{n_b(t) - n_b^*}{\alpha_1} \tag{A17}$$

and

$$m_t \equiv \frac{n_y(t) - n_y^*}{\alpha_1\alpha_2}. \tag{A18}$$

With these variables the Langevin equations (A12) become
Eq. (47), with the noise amplitudes (A14) transformed to

$$D_r = D_b^*/\alpha_1^2, \qquad D_m = D_y^*/(\alpha_1\alpha_2)^2. \tag{A19}$$
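The resulting linear Langevin equations can be explored with a minimal Euler-Maruyama sketch. It assumes that Eq. (47) has the form $\dot{x} = -\omega_x x + \xi^x$, $\dot{r} = \omega_r(x-r) + \xi^r$, $\dot{m} = \omega_m(r-m) + \xi^m$ with independent white noises of strength $2D_i$; the function name, parameter values, time step and ensemble size are hypothetical choices.

```python
import numpy as np


def simulate(omega_x=1.0, omega_r=2.0, omega_m=3.0,
             D_x=1.0, D_r=0.5, D_m=0.2,
             dt=0.01, steps=1000, n_traj=20000, seed=0):
    """Euler-Maruyama integration of an ensemble of (x, r, m) trajectories."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_traj)
    r = np.zeros(n_traj)
    m = np.zeros(n_traj)
    for _ in range(steps):
        # Gaussian increments with variance 2 * D_i * dt for each variable.
        noise = rng.standard_normal((3, n_traj)) * np.sqrt(2.0 * dt)
        x_new = x - omega_x * x * dt + np.sqrt(D_x) * noise[0]
        r_new = r + omega_r * (x - r) * dt + np.sqrt(D_r) * noise[1]
        m_new = m + omega_m * (r - m) * dt + np.sqrt(D_m) * noise[2]
        x, r, m = x_new, r_new, m_new
    return x, r, m
```

After relaxation, the sample variance of `x` should approach $D_x/\omega_x$ and the sample covariance of `x` and `r` should approach $(D_x/\omega_x)\,\nu_r/(1+\nu_r)$, up to statistical and discretization errors of order one percent for these settings.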


3. Quantities in the continuum limit

We consider a vector $(z_1, z_2, z_3) = (x, r, m)$ with transition
rates

$$w^{(1)}_\pm(z) = \frac{D_x}{dx^2}\exp\left[\pm\frac{F_x(x)\,dx}{2}\right], \tag{A20}$$
$$w^{(2)}_\pm(z) = \frac{D_r}{dr^2}\exp\left[\pm\frac{F_r(x,r)\,dr}{2}\right], \tag{A21}$$
$$w^{(3)}_\pm(z) = \frac{D_m}{dm^2}\exp\left[\pm\frac{F_m(r,m)\,dm}{2}\right], \tag{A22}$$

where the diffusion constants $D_i$ are assumed to
be independent of $(x,r,m)$. The following relations are
obtained by taking their expressions for the discrete
case in Sec. II and then taking the continuous limit
$(dx, dr, dm) \to 0$, where the probability is replaced by
a density, i.e., $P(x,r,m) \to p(x,r,m)\,dx\,dr\,dm$.

Learning rate - From Eqs. (A2) and (A4) the learning
rate (8) becomes

$$l_y = \int dx \int dr \int dm\, J_r(x,r,m)\frac{\partial}{\partial r}\ln p(x|r,m) + \int dx \int dr \int dm\, J_m(x,r,m)\frac{\partial}{\partial m}\ln p(x|r,m), \tag{A23}$$

where $p(x|r,m) \equiv p(x,r,m)/[\int p(x,r,m)\,dx]$. This expression
can also be found in [46], where the learning
rate is called information flow. Integration by parts and
the steady state property $\partial_x J_x + \partial_r J_r + \partial_m J_m = 0$ leads
to the alternative expression

$$l_y = -\int dx \int dr \int dm\, J_x(x,r,m)\frac{\partial}{\partial x}\ln p(x|r,m). \tag{A24}$$

Coarse grained learning rate - The coarse grained
learning rate in Eq. (23) becomes

$$l_r = \int dx \int dr\, J_r(x,r)\frac{\partial}{\partial r}\ln p(x|r), \tag{A25}$$

where $J_r(x,r) \equiv \int dm\, J_r(x,r,m)$, $p(x,r) \equiv \int dm\, p(x,r,m)$
and $p(x|r) \equiv p(x,r)/[\int p(x,r)\,dx]$.

Entropy production - The entropy production in (15)
is separated into two contributions

$$\sigma_y = \sigma_r + \sigma_m, \tag{A26}$$

as shown in Eqs. (43) and (45). In the continuous limit,
using Eqs. (A2) and (A4), these contributions become

$$\sigma_r = \int dx \int dr \int dm\, J_r(x,r,m)F_r(x,r), \tag{A27}$$

and

$$\sigma_m = \int dx \int dr \int dm\, J_m(x,r,m)F_m(r,m). \tag{A28}$$

Coarse grained entropy production - From Eqs. (A2),
(A4) and (A28), the coarse grained entropy production
(20) becomes

$$\tilde{\sigma}_y = \int dr \int dm \left[\int J_r(x,r,m)\,dx\right]\left[\int F_r(x,r)\,p(x|r,m)\,dx\right] + \sigma_m. \tag{A29}$$

The last term $\sigma_m$ remains the same because $m$ is not
directly influenced by the signal $x$.

Upper bound on transfer entropy rate - The upper
bound of the transfer entropy rate (18) becomes

$$\overline{\mathcal{T}}_{x\to y} = \frac{D_r}{4}\int dx \int dr \int dm\, p(x,r,m)\left[F_r(x,r)^2 - \overline{F}_r(r,m)^2\right], \tag{A30}$$
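where we used the averaged force

$$\overline{F}_r(r,m) \equiv \int dx\, p(x|r,m)F_r(x,r). \tag{A31}$$

Since $\overline{F}_m(r,m) = F_m(r,m)$ the contribution due to $m$ is
zero. For $\overline{\mathcal{T}}_{x\to r}$ defined in Eq. (33) we replace $p(x|r,m)$
by $p(x|r)$ in Eqs. (A31) and (A30), which leads to the
expression

$$\overline{\mathcal{T}}_{x\to r} = \frac{D_r}{4}\int dx \int dr\, p(x,r)\left[F_r(x,r)^2 - \overline{F}_r(r)^2\right], \tag{A32}$$

where $\overline{F}_r(r) \equiv \int dx\, p(x|r)F_r(x,r)$.


4. Gaussian linear processes

We now consider a linear Langevin equation of the
form

$$\frac{d}{dt}\begin{pmatrix} x_t \\ y_t \end{pmatrix} = -A\begin{pmatrix} x_t \\ y_t \end{pmatrix} + \xi_t, \tag{A33}$$

where $\langle \xi^i_t \xi^j_{t'}\rangle = 2D_i\delta_{ij}\delta(t-t')$. The matrices $A$ and $D$ for
the bare sensor $y = r$ are given by (50) and for the sensor
with a memory $y = (r,m)$ they are given by (56). The
steady state solution of this Langevin equation is a multivariate
normal distribution $p(x,y)$ with zero mean and
covariance $\Sigma$, which is the stationary solution of (49).
Comparing Eqs. (A7) and (A33) the drift term is

$$F(x,y) = -D^{-1}A\begin{pmatrix} x \\ y \end{pmatrix}. \tag{A34}$$

The probability current defined in Eq. (A4) is then given
by

$$J(x,y) = -\left[A - D\Sigma^{-1}\right]\begin{pmatrix} x \\ y \end{pmatrix}p(x,y), \tag{A35}$$

where $\Sigma^{-1}$ is the inverse of $\Sigma$.

We define the matrix

$$\Phi \equiv \int dx \int dy\, J(x,y)F(x,y)^T. \tag{A36}$$

Eqs. (A34) and (A35) yield

$$\Phi = \left[A - D\Sigma^{-1}\right]\Sigma A^T D^{-1} = A\Sigma A^T D^{-1} - DA^T D^{-1}, \tag{A37}$$

where we used the fact that $p(x,y)$ is a multivariate
Gaussian density. With this expression, from Eq. (A27)
we obtain

$$\sigma_r = \frac{\omega_x\nu_r^2}{B_r(1+\nu_r)}, \tag{A38}$$

and from Eq. (A28) we obtain

$$\sigma_m = \omega_x\nu_m^2\,\frac{\nu_r^2 + B_r(1+\nu_m)(1+\nu_r)}{B_m(1+\nu_m)(1+\nu_r)(\nu_m+\nu_r)}, \tag{A39}$$

where $\epsilon_x^2 = D_x/\omega_x$, $\nu_r = \omega_r/\omega_x$, $B_r = D_r/D_x$,
$\nu_m = \omega_m/\omega_x$, $B_m = D_m/D_x$ (as defined in Sec. IV).

The gradient of the log of the density reads

$$a(x,y) \equiv -\nabla\ln p(x,y) = \Sigma^{-1}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{A40}$$

With the matrix

$$L \equiv \int dx \int dy\, J(x,y)a(x,y)^T = -(A - D\Sigma^{-1})\Sigma\Sigma^{-1} = -A + D\Sigma^{-1}, \tag{A41}$$

where we used Eqs. (A35) and (A40), the learning rate
$l_y = L_{xx}$ (A24) reads

$$l_y = L_{xx} = \omega_x\left[-1 + \epsilon_x^2\,(\Sigma^{-1})_{xx}\right]. \tag{A42}$$

The $2\times 2$ covariance matrix of $(x,r)$ given by (51) yields

$$l_r = L_{xx} = \frac{\omega_x\nu_r^3}{\nu_r^2 + B_r(1+\nu_r)^2}. \tag{A43}$$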

For the case with memory, where $(x,y) = (x,r,m)$, the
explicit form of the learning rate (A42) is given by


ly - Lxx - 


Wxt-r (l^m + Vr) (t'mt'r 4- 1) -|- [B^n (t'm + 1) ^ ] 


[^m + (dt'r -|- 2) -|- -|- 2Vy -|- 2] -|- B^ (v^ "I" 1) ^ {Vy -|-l)^-|-r'^}-|- B^ (iZm -|- 1) ^ [By (Vy -|- 1) ^ iz^] {l/yy^ -\- Uy) ^ 

(A44) 


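The upper bound on the transfer entropy rate (A30) reads

$$\overline{\mathcal{T}}_{x\to y} = \frac{\omega_r^2}{4D_r}\int dx \int dr \int dm\, p(x,r,m)\left[x - \langle x|r,m\rangle\right]^2, \tag{A45}$$

where $\langle x|r,m\rangle \equiv \int p(\tilde{x}|r,m)\,\tilde{x}\,d\tilde{x}$ and we used $F_r(x,r) = \omega_r(x-r)/D_r$.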


Appendix B: Uncertainty from instantaneous state 
and from time-series 


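We first consider a sensor with memory $y = (r,m)$.
The covariance matrix, which is the stationary solution
of (49) with matrices given by (56), is written as

$$\Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xr} & \Sigma_{xm} \\ \Sigma_{xr} & \Sigma_{rr} & \Sigma_{rm} \\ \Sigma_{xm} & \Sigma_{rm} & \Sigma_{mm} \end{pmatrix} \equiv \begin{pmatrix} \epsilon_x^2 & b^T \\ b & \Sigma_y \end{pmatrix}. \tag{B1}$$

The linear estimate of $x$ from $y$ is $\hat{x}(y) = c^T y$, where $c$
is a vector. Minimizing the variance

$$\langle [x - \hat{x}(y)]^2\rangle = \epsilon_x^2 - 2c^T b + c^T\Sigma_y c, \tag{B2}$$

which is minimal for $c = \Sigma_y^{-1}b$, leads to the uncertainty

$$\epsilon_{x|y}^2 = \epsilon_x^2 - b^T\Sigma_y^{-1}b. \tag{B3}$$

Following the same procedure for a bare sensor with $y = r$,
$\Sigma_y = \Sigma_{rr}$ and $b = \Sigma_{xr}$, the covariance matrix
(51) leads to an uncertainty

$$\epsilon_{x|r}^2 = \epsilon_x^2\left[1 - \frac{\nu_r^3}{(1+\nu_r)\left[\nu_r^2 + B_r(1+\nu_r)\right]}\right]. \tag{B4}$$

Comparing Eq. (55) with Eq. (B4) we obtain

$$\overline{\mathcal{T}}_{x\to r} = \omega_x\frac{\nu_r^2}{4B_r}\frac{\epsilon_{x|r}^2}{\epsilon_x^2}. \tag{B5}$$

Likewise, from Eq. (A45), with $p(x,r,m)$ a multivariate
Gaussian with zero mean and covariance matrix
(B1), and Eq. (B3) we obtain

$$\overline{\mathcal{T}}_{x\to y} = \omega_x\frac{\nu_r^2}{4B_r}\frac{\epsilon_{x|y}^2}{\epsilon_x^2}. \tag{B6}$$

The best estimate $\hat{x}_t$ that uses the time-series of the
sensor $\{r_{t'}\}_{t'\le t}$ to minimize the uncertainty $\epsilon^2 = \langle (x_t - \hat{x}_t)^2\rangle$
is known as the Kalman-Bucy filter [45, 68]. For
the linear Gaussian process from (47) the best estimate $\hat{x}_t$
satisfies $\langle r_{t'}\hat{x}_t\rangle = \langle r_{t'}x_t\rangle$ for all $t' \le t$ and $\langle \hat{x}_t(x_t - \hat{x}_t)\rangle = 0$
(see [68]). It can be shown that the minimal error
satisfies the Riccati equation, which reads [45, 68]

$$\frac{d}{dt}\epsilon_t^2 = -2\omega_x\epsilon_t^2 + 2D_x - \frac{\omega_r^2}{2D_r}\epsilon_t^4. \tag{B7}$$

The stationary solution of this equation gives the uncertainty
about the signal given the sensor trajectory

$$\epsilon_{x|r^{\rm traj}}^2 = \epsilon_x^2\,\frac{2B_r}{\nu_r^2}\left[\sqrt{1+\nu_r^2/B_r} - 1\right]. \tag{B8}$$

Comparing with Eq. (53) we obtain

$$\mathcal{T}_{x\to r} = \omega_x\frac{\nu_r^2}{4B_r}\frac{\epsilon_{x|r^{\rm traj}}^2}{\epsilon_x^2}. \tag{B9}$$

The simple relations (B5), (B6), and (B9) are valid for
our model system that corresponds to a linear Gaussian
process. Since for $C = 1$ the transfer entropy rate equals
its upper bound, for our model system a maximal sensory
capacity $C = 1$ implies $\epsilon_{x|r^{\rm traj}} = \epsilon_{x|y}$. In this case the
linear estimate $\hat{x}(y) = c^T y = b^T\Sigma_y^{-1}y$ from Eq. (B2)
coincides with the estimate from the Kalman-Bucy filter
$\hat{x}_t$, which is similar to the finding in [45] for optimal
feedback cooling.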
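The relation between the two uncertainties for the bare sensor can be illustrated with a short sketch. It assumes $\epsilon_{x|r}^2/\epsilon_x^2 = [\nu_r^2 + B_r(1+\nu_r)^2]/\{(1+\nu_r)[\nu_r^2 + B_r(1+\nu_r)]\}$, which is an algebraically equivalent rewriting of the form assumed for Eq. (B4), and the stationary Kalman-Bucy error $\epsilon_{x|r^{\rm traj}}^2/\epsilon_x^2 = (2B_r/\nu_r^2)[\sqrt{1+\nu_r^2/B_r}-1]$; the function names are hypothetical.

```python
import math


def eps2_instantaneous(nu_r, B_r):
    """Uncertainty from the instantaneous state r_t, in units of eps_x^2."""
    numerator = nu_r**2 + B_r * (1 + nu_r) ** 2
    denominator = (1 + nu_r) * (nu_r**2 + B_r * (1 + nu_r))
    return numerator / denominator


def eps2_trajectory(nu_r, B_r):
    """Stationary Kalman-Bucy uncertainty from the full trajectory, in units of eps_x^2."""
    return 2 * B_r / nu_r**2 * (math.sqrt(1 + nu_r**2 / B_r) - 1)


def capacity_from_uncertainties(nu_r, B_r):
    """C_r = l_r / T, using l_r / omega_x = 1 / eps2_instantaneous - 1."""
    learning = 1.0 / eps2_instantaneous(nu_r, B_r) - 1.0
    transfer = nu_r**2 / (4 * B_r) * eps2_trajectory(nu_r, B_r)
    return learning / transfer
```

The trajectory-based uncertainty never exceeds the instantaneous one, and the two coincide when $\nu_r = \sqrt{1+\nu_r^2/B_r}$, for example at $\nu_r = 2$, $B_r = 4/3$, where `capacity_from_uncertainties` returns a value equal to 1 up to rounding.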